Haptic technologies, in both their kinesthetic and tactile aspects, bring brand-new opportunities to recent human-machine interactive applications. In this talk, I, who believe that one of the essential roles of a researcher is pioneering new insights and knowledge, will present my previous research on haptic technologies and human-machine interactive applications in two branches: laser-based mid-air haptics and sensorimotor skill learning. For the former branch, I will introduce our approach, named indirect laser radiation, and its application. Indirect laser radiation uses a laser and a light-absorbing elastic medium to evoke a tapping-like tactile sensation. For the latter, I will introduce our data-driven approach to both modeling and learning sensorimotor skills (in particular, driving) with kinesthetic assistance and artificial neural networks; I call it human-like haptic assistance. To unify these two branches of my earlier studies, which explore the feasibility of the sensory channel named "touch", I will present a general research paradigm for human-machine interactive applications toward which current haptic technologies can aim in the future.
Organizers: Katherine J. Kuchenbecker
Needle insertion is the most essential skill in medical care; training must be provided not only to physicians but also to nurses and paramedics. In most needle insertion procedures, haptic feedback from the needle is the main stimulus that novices must be trained on. For better patient safety, the classical methods of training haptic skills have to be replaced with simulators based on new robotic and graphics technologies. The main objective of this work is to develop analytical models of needle insertion (for the special case of epidural anesthesia), including the biomechanical and psychophysical concepts that simulate the needle-tissue interaction forces in linear heterogeneous tissues, and to validate the models with a series of experiments. The biomechanical and perception models were validated with experiments in two stages: with and without human intervention. The second stage is a validation using a Turing test with two different experiments: 1) to observe the perceptual difference between the simulated and the physical phantom model, and 2) to verify the effectiveness of the perceptual filter by comparing the unfiltered and filtered model responses. The results showed that the model could replicate the physical phantom tissues with good accuracy. This can be further extended to a non-linear heterogeneous model. The proposed needle-tissue interaction force models can be used to improve the realism and performance of needle simulators in heterogeneous tissue and to enable future applications. A needle insertion training simulator was developed with the simulated models using Omni Phantom, and clinical trials were conducted to assess face validity and construct validity. The face validity results showed that the degree of realism of the virtual environments and instruments had the lowest overall mean score, while ease of use and training in hand-eye coordination had the highest mean score.
The construct validity results showed that the simulator was able to successfully differentiate the force and psychomotor signatures of anesthesiologists with less than 5 years of experience from those with more than 5 years. As a performance index for trainees, a novel measure, Just Controllable Difference (JCD), was proposed, and a preliminary study of the JCD measure was carried out using two experiments with novices. A preliminary study on the use of clinical training simulations, especially needle insertion procedures in virtual environments, focused on two objectives: first, measuring the force just noticeable difference (JND) with three fingers, and second, comparing these measures in Non-Immersive Virtual Reality (NIVR) with those in Immersive Virtual Reality (IVR) using a psychophysical study with the Force Matching task, the Constant Stimuli method, and Isometric Force Probing stimuli. The results showed a better force JND in IVR than in NIVR. In addition, a simple state observer model was proposed to explain the improvement of the force JND in IVR. This study quantitatively reinforces the use of IVR in the design of various medical simulators.
Organizers: Katherine J. Kuchenbecker
Functional polymers can be easily tailored for their interaction with living organisms. In our group, we have worked for the last 15 years on the development of this kind of polymeric material with different functionalities, high biocompatibility, and in different forms. In this talk, we will describe the synthesis of thermosensitive thin films that can be used to prevent biofilm formation on medical devices, the preparation of biodegradable polymers specially designed as vectors for gene transfection, and a new family of zwitterionic polymers that are able to cross the intestinal mucus for oral delivery applications. The structure-functionality-application relationship will be discussed for each example.
Organizers: Metin Sitti
Since Hubel and Wiesel's seminal findings in the primary visual cortex (V1) more than 50 years ago, progress in vision science has been very limited along previous frameworks and schools of thought on understanding vision. Have we been asking the right questions? I will show observations motivating a new path. First, a drastic information bottleneck forces the brain to process only a tiny fraction of the massive visual input information; this selection is called attentional selection, and how this tiny fraction is selected is critical. Second, a large body of evidence has been accumulating to suggest that the primary visual cortex (V1) is where this selection starts, suggesting that the visual cortical areas along the visual pathway beyond V1 must be investigated in light of this selection in V1. Placing attentional selection at center stage, a new path to understanding vision is proposed (articulated in my book "Understanding vision: theory, models, and data", Oxford University Press 2014). I will show a first example of using this new path, which aims to ask new questions and make fresh progress. I will relate our insights to artificial vision systems to discuss issues like top-down feedback in hierarchical processing, analysis-by-synthesis, and image understanding.
The amount of digital video content available is growing daily, on sites such as YouTube. Recent statistics on the YouTube website show that around 48 hours of video are uploaded every minute. This massive data production calls for automatic analysis.
In this talk we present some recent results for action recognition in videos. Bag-of-features representations have shown very good performance for action recognition in videos. We briefly review the underlying principles and introduce trajectory-based video features, which have been shown to outperform the state of the art. These trajectory features are obtained by dense point sampling and tracking based on displacement information from a dense optical flow field. Trajectory descriptors are obtained with motion boundary histograms, which are robust to camera motion. We then show how to integrate temporal structure into a bag-of-features based on an actom sequence model. Actom sequence models localize actions based on sequences of atomic actions, i.e., they represent the temporal structure by sequences of histograms of actom-anchored visual features. This representation is flexible, sparse and discriminative. The resulting actom sequence model is shown to significantly improve performance over existing methods for temporal action localization.
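The bag-of-features principle mentioned above can be sketched in a few lines. This is only a minimal illustration, not the speakers' pipeline: the function names, the 2-D toy descriptors, and the three-word codebook are all hypothetical, and real systems would use high-dimensional trajectory descriptors and a codebook learned by clustering.

```python
import math

def nearest_codeword(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor."""
    best_index, best_dist = 0, math.inf
    for index, codeword in enumerate(codebook):
        # Squared Euclidean distance between descriptor and codeword.
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def bag_of_features(descriptors, codebook):
    """Quantize descriptors and return an L1-normalized histogram."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

# Hypothetical toy data: 2-D descriptors and a 3-word codebook.
toy_codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
toy_descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
toy_histogram = bag_of_features(toy_descriptors, toy_codebook)
```

The resulting histogram discards where and when each descriptor occurred, which is the gap that temporally structured models such as the actom sequence model are meant to address.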
Finally, we show how to move towards more structured representations by explicitly modeling human-object interactions. We learn how to represent human actions as interactions between persons and objects. We localize in space and track over time both the object and the person, and represent an action as the trajectory of the object with respect to the person position, i.e., our human-object interaction features capture the relative trajectory of the object with respect to the human. This is joint work with A. Gaidon, V. Ferrari, Z. Harchaoui, A. Klaeser, A. Prest, and H. Wang.
The supervision of public spaces aims at multiple objectives, such as early acquisition of targets, their identification and pursuit throughout the supervised area. To achieve these, typical sensors such as pan-tilt-zoom cameras need to either focus on individuals, or provide a broad field of view, which are conflicting control settings. We address this problem in an information-theoretic manner: by phrasing each of the objectives in terms of mutual information, they become comparable. The problem turns into maximisation of information, which is predicted for the next time step and phrased as a decision process.
Our approach results in decisions that on average satisfy objectives in desired proportions. At the end of the talk I will address an application of information maximisation to aid in the interactive calibration of cameras.
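To make the common currency concrete, here is a minimal sketch of computing mutual information from a discrete joint distribution. The function name and the two toy tables are assumptions for illustration only; the talk does not specify how the distributions are represented or predicted.

```python
import math

def mutual_information(joint):
    """Mutual information I(X;Y) in bits from a joint probability table.

    joint[i][j] is P(X = i, Y = j); the entries are assumed to sum to 1.
    """
    px = [sum(row) for row in joint]          # marginal of X
    py = [sum(col) for col in zip(*joint)]    # marginal of Y
    info = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:  # zero-probability cells contribute nothing
                info += pxy * math.log2(pxy / (px[i] * py[j]))
    return info

# Hypothetical toy tables: an observation that tells nothing about the
# state, and one that determines it completely.
independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]
mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
```

Because every objective is scored in the same unit (bits), the expected gains of competing camera settings can be compared directly.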
Recovering the depth of a scene is important for bridging the gap between the real and the virtual world, but also for tasks such as segmenting objects in cluttered scenes. Very cheap single-view depth imaging cameras, such as Time-of-Flight (ToF) cameras or Microsoft's Kinect system, are entering the mass consumer market. In general, the acquired images have a low spatial resolution and suffer from noise as well as technology-specific artifacts. In this talk I will present algorithmic solutions to the entire depth imaging pipeline, ranging from preprocessing to depth image analysis. For enhancing image intensity and depth maps, a higher-order total variation based approach has been developed which yields superior results compared to current state-of-the-art approaches. This performance has been achieved by allowing jumps across object boundaries, computed both from the image gradients and the depth maps. Within objects, staircasing effects as observed in standard total variation approaches are circumvented by higher-order regularization. The 2.5D motion or range flow of the observed scenes is computed by a combined global-local approach.
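The staircasing remark can be illustrated with a toy 1-D comparison. This is a sketch under simplifying assumptions, not the regularizer used in the talk (which is not specified here): it only compares sums of absolute first and second differences on two hypothetical signals.

```python
def total_variation(signal):
    """First-order total variation: sum of absolute first differences."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))

def second_order_variation(signal):
    """Second-order variant: sum of absolute second differences."""
    return sum(abs(c - 2 * b + a)
               for a, b, c in zip(signal, signal[1:], signal[2:]))

ramp = [0, 1, 2, 3, 4]        # smooth slope, e.g. a tilted surface
staircase = [0, 0, 2, 2, 4]   # piecewise-constant approximation of it

tv_ramp = total_variation(ramp)
tv_stairs = total_variation(staircase)
tv2_ramp = second_order_variation(ramp)
tv2_stairs = second_order_variation(staircase)
```

In this example the first-order penalty is the same for both signals, so it does not prefer the smooth ramp over the staircase, whereas the second-order penalty is zero for the ramp and positive for the staircase.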
Particularly on Kinect data, the best results were achieved by discarding information on object edges, which are prone to errors due to the data acquisition process. In conjunction with a calibration procedure, this leads to very accurate and robust motion estimation. On these computed range flow data, we have developed the estimation of robust, scale- and rotation-invariant features. These make it feasible to use our algorithms for a novel approach to gesture recognition for man-machine interactions. This step is currently work in progress and I will present very promising first results.
For evaluating the results of our algorithms, we plan to use realistic simulations and renderings. We have made significant advances in analyzing the feasibility of these synthetic test images and data. The bidirectional reflectance distribution functions (BRDFs) of several objects have been measured using a purpose-built “light-dome” setup. This, together with the development of an accurate stereo-acquisition system for measuring 3D objects, lays the groundwork for performing realistic renderings. Additionally, we have started to create a test-image database with ground truth for depth, segmentation and light-field data.
3D scanning of moving objects has many applications, for example, marker-less motion capture, analysis of fluid dynamics, object explosions and so on. One approach to acquiring accurate shape is a projector-camera system; in particular, methods that reconstruct a shape from a single image with a static pattern are suitable for capturing fast-moving objects. In this research, we propose a method that uses a grid pattern consisting of sets of parallel lines. The pattern is spatially encoded by a periodic color pattern. While the information in the camera image is sparse, the proposed method extracts dense (pixel-wise) phase information from the sparse pattern.
As a result, continuous regions in the camera images can be extracted by analyzing the phase. Since there remains one degree of freedom (DOF) for each region, we propose a linear solution that eliminates the DOF by using geometric information about the devices, i.e. the epipolar constraint. In addition, because the projected pattern consists of parallel lines with equal intervals, the solution space is finite, and the linear equation can be solved efficiently by an integer least-squares method.
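The idea of an integer least-squares problem over a finite solution space can be sketched as follows. This is a plain brute-force search, swapped in for illustration; it is not the efficient solver the abstract refers to, and the matrix, right-hand side, and candidate range are hypothetical toy values.

```python
from itertools import product

def integer_least_squares(a_matrix, b_vector, candidates):
    """Exhaustively minimize ||A x - b||^2 over integer vectors x.

    Every component of x is drawn from `candidates`, so the search space
    is finite. Returns (best_x, best_cost).
    """
    n_unknowns = len(a_matrix[0])
    best_x, best_cost = None, float("inf")
    for x in product(candidates, repeat=n_unknowns):
        cost = 0.0
        for row, b in zip(a_matrix, b_vector):
            residual = sum(a * xi for a, xi in zip(row, x)) - b
            cost += residual * residual
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost

# Hypothetical toy system with two integer unknowns and three equations.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
b = [2.1, -0.9, 3.2]
solution, cost = integer_least_squares(A, b, range(-3, 4))
```

Exhaustive search grows exponentially with the number of unknowns, which is why a dedicated integer least-squares method is preferable in practice.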
A scanning system that can capture an object in fast motion was developed using a high-speed camera. In the experiments, we show sequences of dense shapes of an exploding balloon and other objects at more than 1000 fps.
Fitting statistical 2D and 3D shape models to images is necessary for a variety of tasks, such as video editing and face recognition. Much progress has been made on local fitting from an initial guess, but determining a close enough initial guess is still an open problem. One approach is to detect distinct landmarks in the image and initialize the model fit from these correspondences. This is difficult, because detection of landmarks based only on their local appearance is inherently ambiguous, making it necessary to use global shape information for the detections. We propose a method to solve the combinatorial problem of selecting out of a large number of candidate landmark detections the configuration which is best supported by a shape model.
Our method, as opposed to previous approaches, always finds the globally optimal configuration. The algorithm can be applied to a very general class of shape models and is independent of the underlying feature point detector.
This talk concerns the use of physics-based models for human pose tracking and scene inference. We outline our motivation for physics-based models, some results with monocular pose tracking in terms of biomechanically inspired controllers, and recent results on the inference of scene interactions. We show that physics-based models facilitate the estimation of physically plausible human motion with little or no mocap data required. Scene interactions play an integral role in modeling sources of external forces acting on the body.
In spite of the significant effort that has been devoted to the core problems of object and action recognition in images and videos, the recognition performance of state-of-the-art algorithms is well below what would be required for successful deployment in many applications. Additionally, there are challenging combinatorial problems associated with constructing globally “optimal” descriptions of images and videos in terms of potentially very large collections of object and action models. The constraints that are utilized in these optimization procedures are loosely referred to as “context.” So, for example, vehicles are generally supported by the ground, so that an estimate of ground plane location parameters in an image constrains positions and apparent sizes of vehicles. Another source of context is the everyday spatial and temporal relationships between objects and actions; so, for example, keyboards are typically “on” tables and not “on” cats.
The first part of the talk will discuss how visually grounded models of object appearance and relations between objects can be simultaneously learned from weakly labeled images (images which are linguistically but not spatially annotated – i.e., we are told there is a car in the image, but not where the car is located).
Next, I will discuss how these models can be more efficiently learned using active learning methods. Once these models are acquired, one approach to inferring what objects appear in a new image is to segment the image into pieces, construct a graph based on the regions in the segmentation and the relationships modeled, and then apply probabilistic inference to the graph. However, this typically results in a very dense graph with many “noisy” edges, leading to inefficient and inaccurate inference. I will briefly describe a learning approach that can construct smaller and more informative graphs for inference.
Finally, I will relax the (unreasonable) assumption that one can segment an image into regions that correspond to objects, and describe an approach that can simultaneously construct instances of objects out of collections of connected segments that look like objects, while also softly enforcing contextual constraints.
Organizers: Michel Besserve
Human pose estimation from monocular images is one of the most challenging and computationally demanding problems in computer vision. Standard models such as Pictorial Structures consider interactions between kinematically-connected joints or limbs, leading to inference quadratic in the number of pixels.
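The quadratic cost mentioned above can be seen in a minimal max-sum dynamic program. The sketch below makes strong simplifying assumptions that are not from the talk: parts form a chain rather than a tree, states are a handful of 1-D positions rather than image pixels, and the function name, unary scores, and pairwise score are hypothetical.

```python
def chain_map_inference(unary, pairwise):
    """Max-sum dynamic programming on a chain of parts.

    unary[p][s]      : score of placing part p in state s
    pairwise(p, s, t): score for part p in state s and part p + 1 in state t
    Returns (best_score, best_states). Cost is O(parts * states^2).
    """
    n_parts = len(unary)
    n_states = len(unary[0])
    score = list(unary[0])
    backpointers = []
    for p in range(1, n_parts):
        new_score, pointers = [], []
        for t in range(n_states):
            # The inner search over s makes inference quadratic in states.
            best_s = max(range(n_states),
                         key=lambda s: score[s] + pairwise(p - 1, s, t))
            new_score.append(score[best_s] + pairwise(p - 1, best_s, t)
                             + unary[p][t])
            pointers.append(best_s)
        score = new_score
        backpointers.append(pointers)
    # Backtrack from the best final state to recover the configuration.
    last = max(range(n_states), key=lambda s: score[s])
    best_score = score[last]
    states = [last]
    for pointers in reversed(backpointers):
        states.append(pointers[states[-1]])
    states.reverse()
    return best_score, states

# Hypothetical toy problem: 3 parts, 4 candidate positions each, and a
# geometric term that penalizes distance between neighboring parts.
toy_unary = [[0, 3, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]]
best_score, best_states = chain_map_inference(
    toy_unary, lambda p, s, t: -abs(s - t))
```

With one state per pixel, the nested loops over `s` and `t` quickly become the bottleneck, which is the cost that richer, data-dependent pairwise terms make hard to reduce with standard shortcuts.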
As a result, researchers and practitioners have restricted themselves to simple models which only measure the quality of limb-pair possibilities by their 2D geometric plausibility. In this talk, we propose novel methods which allow for efficient inference in richer models with data-dependent interaction cliques.
First, we introduce structured prediction cascades, a structured analog of binary cascaded classifiers, which learn to focus computational effort where it is needed, filtering out many states cheaply while ensuring the correct output is unfiltered.
Second, we propose a way to decompose models of human pose with cyclic dependencies into a collection of tree models, and provide novel methods to impose model agreement. These techniques allow for sparse and efficient inference on the order of minutes per image or video clip.
As a result, we can afford to model pairwise interaction potentials much more richly with data-dependent features such as contour continuity, segmentation alignment, color consistency, optical flow and more.
Finally, we apply these techniques to higher-order cliques, extending the idea of poselets to structured models. We show empirically that these richer models are worthwhile, obtaining significantly more accurate pose estimation on popular datasets.
Organizers: Michel Besserve
Pose estimation and tracking have been a focus of computer vision research for many years. Despite many successes, however, most approaches to date are still not able to recover physically realistic (natural looking) 3D motions and are restricted to captures indoors or with simplified backgrounds. In the first part of this talk, I will briefly introduce a class of models that use physics to constrain the motion of the subject to more realistic interpretations.
In particular, we formulate the pose tracking problem as one of inference of control mechanisms which implicitly (through physical simulation) generate the kinematic motion matching the image observations. This formulation of the problem has a number of benefits with respect to more traditional kinematic models. In the second part of the talk, I will describe a new proof-of-concept framework for capturing human motion in outdoor environments where traditional motion capture systems, including marker-less motion systems, would typically be inapplicable.
The proposed system consists of a number of small body-mounted cameras, placed on all major segments of the body, and is capable of recovering the underlying skeletal motion by observing the scene as it changes, within each camera view, with the motion of the subject's body.
Organizers: Michel Besserve
Shape analysis aims to describe either a single shape or a population of shapes in an efficient and informative way. This is a key problem in various applications such as mesh deformation and animation, object recognition, and mesh parameterization.
I will present a number of approaches to process shapes that are nearly isometric. The first approach computes the correspondence information between a population of shapes in this setting. Second and third are approaches to morph between two shapes and to segment a population of shapes into near-rigid components. Next, I will present an approach for isometry-invariant shape description and feature extraction.
Furthermore, I will present an algorithm to compute the correspondence information between human bodies in varying postures. In addition to being nearly isometric, human body shapes share the same geometric structure, and we can take advantage of this prior geometric information to find accurate correspondences. Finally, I will discuss some applications of shape analysis in computer-aided design.